Re-Envisioning Data Description Using Peirce's Pragmatics

Authors

  • Mark Gahegan
  • Benjamin Adams
Abstract

Given the growth in geographical data production, and the various mandates to make sharing of data a priority, there is a pressing need to facilitate the appropriate uptake and reuse of geographical data. However, describing the meaning and quality of data and thus finding data to fit a specific need remain as open problems, despite much research on these themes over many years. We have strong metadata standards for describing facts about data, and ontologies to describe semantic relationships among data, but these do not yet provide a viable basis on which to describe and share data reliably. We contend that one reason for this is the highly contextual and situated nature of geographic data, something that current models do not capture well – and yet they could. We show in this paper that a reconceptualization of geographical information in terms of Peirce's Pragmatics (specifically firstness, secondness and thirdness) can provide the necessary modeling power for representing situations of data use and data production, and for recognizing that we do not all see and understand in the same way. This in turn provides additional dimensions by which intentions and purpose can be brought into the representation of geographical data. Doing so does not solve all problems related to sharing meaning, but it gives us more to work with. Practically speaking, enlarging the focus from data model descriptions to descriptions of the pragmatics of the data – community, task, and domain semantics – allows us to describe the how, who, and why of data. These pragmatics offer a mechanism to differentiate between the perceived meanings of data as seen by different users, specifically in our examples herein between producers and consumers. Formally, we propose a generative graphical model for geographic data production through pragmatic description spaces and a pragmatic data description relation. As a simple demonstration of viability, we also show how this model can be used to learn knowledge about the community, the tasks undertaken, and even domain categories, from text descriptions of data and use-cases that are currently available. We show that the knowledge we gain can be used to improve our ability to find fit-for-purpose data.

1 Rethinking the way we describe geographic data

Our efforts to create better geographic data models and communicate richer data descriptions have led to very fruitful avenues of research, such as the representation of semantics, the visualization of uncertainty, the propagation of error, and others [43, 18, 41, 26, 21, 28]. The era of volunteered geographic information (VGI) further complicates the picture with new challenges for understanding spatial data meaning, accuracy, and quality [19, 1]. Research to date may allow us to describe the quality (or perhaps even the semantics) of a single dataset, with effort, but we cannot propagate – with suitable modification – this information into derived products.

Thus the onus remains firmly on the data producer to document quality and meaning of every new dataset. This has never been sustainable; most datasets do not have comprehensive quality information at the level of sophistication that consumers need. It is even less sustainable in the era of VGI and mash-ups, where more data is combined in hitherto unanticipated ways than ever before.

Furthermore, there is a real danger that all these different research strands have moved us further away from the actual problem, of describing these important aspects of data in an integrated and combinable manner, for example so that they can be used together in a query to find useful data. Without a way to bring these threads back together, our fruitful research avenues are in danger of becoming cul-de-sacs. Our approach to modeling geographic data is drastically in need of an overhaul.
Finally, as a community, we have been guilty of concentrating too heavily on the perspective of the data producer: describing 'facts' about data, but not acknowledging the tacit world-view that can render these 'facts' true and useful (or not) within a given context. Knowing which 'facts' remain true when the context is changed, and which facts remain relevant, are both key to describing geographical data better. We term this idea the pragmatics of data, after Peirce [32].

1.1 An alternative approach

We suggest the following five propositions offer an alternative way forward:

1. We do not know what the eventual user will need to know about the data they wish to use, and we cannot know, in advance, the likely utility of any of the descriptions we may strive to add as data producers (such as ontologies, workflows, and accuracy assessments). And despite the huge volume of work published on conceptual geographic data models, we are no closer to knowing which ones have lasting value. We need empirical evidence, not more rhetoric, to produce a better model.

2. Consequently, we deliberately move away from the search for a single, definitive conceptual model of geographical data, and propose instead a meta-model where we can evaluate the actual utility of various forms of descriptions, from the perspective of specific tasks and research needs, using evidence gathered from actual use-cases.

3. We propose this simple meta-model as a set of description spaces, each composed of 'facets' that represent themes that we believe may have utility. We do not claim that these are either necessary or sufficient; they are rather a place to begin. Within these facets we measure compliance with some kind of desired 'optimal' state, as simply as we can (see Section 2). Again, we make no claim that these facets are right, rather that they may prove to be useful under evaluation and (hopefully) that they are simple enough to be assigned and read with ease.

[Footnote 1: Including work published by the authors of this paper!]

4. We broaden the scope of data description to consider the perspective of the data consumer. So we begin by asking: 'What kinds of things might a consumer of the data want to know?' rather than: 'What kinds of things might a producer of the data be persuaded to say?' Furthermore, current approaches emphasize the where, when, and what aspects of data, with various degrees of success and completeness, but often leave aside the deeper questions of who, how, and why. These questions carry much meaning for a potential consumer of the information (they speak to reputation, quality, and motivation). We believe that some aspects of these deeper questions can be captured, allowing us to frame more practical (and answerable) questions that substitute for deeper ontological and epistemological ones. For example, 'for what task did you make this data?' can act as a surrogate for 'what does this data mean to you?', and 'which organization produced the data?' may in some circumstances substitute for 'what is the likely quality of the data?' These substitutions are certainly not perfect, but in a Bayesian sense they are better than nothing; and what's more, we can readily compute the degree to which they help elucidate the pragmatics we seek, as we show in Section 4.

5. The benefits of such an approach are many: (i) descriptive facets can be added or retracted according to need; (ii) the system could learn over time which kinds of data descriptions are most useful, so that data producers can focus their efforts when creating time-consuming data descriptions; (iii) multiple perspectives onto the meaning and use of data can be supported concurrently, allowing for the natural fact that we do not all see the world in the same way; (iv) shifting the emphasis from producing more metadata to learning from use-cases lifts an unmanageable burden from the data producers; (v) the conceptual model is no longer a fixed thing, but can grow or change as new needs arise, as we learn more about which facets offer the most useful descriptions of data, or as new computational technologies provide us with additional descriptive facets.

The following are some of the many important facets to describe, though of course not an exhaustive list:

– Data Model: What/when/where is it?
  • Spatio-temporal Frameworks (spatio-temporal schema & semantics)
  • Attribute Schema & Semantics
– Process: How was it made and thus how confident are we in it?
  • Quality (Accuracy & Uncertainty)
  • Provenance (lineage)
– Community: Who can/should use it? Why was it made?
  • Motivation
  • Access and licensing
  • Authority (Governance & Trustworthiness)
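
The description spaces and facets listed above can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions, not the model proposed in the paper: the 0 to 1 compliance scale, the weighted-mean `fitness` score, the choice to treat undescribed facets as 0, and all identifiers (`FACETS`, `DatasetDescription`, `record`, `fitness`, `rank`) are hypothetical. It shows how a consumer might weight the facets that matter for a task and rank candidate datasets accordingly.

```python
from dataclasses import dataclass, field

# Description spaces and facets taken from the list above.
# Everything else in this sketch is an illustrative assumption.
FACETS = {
    "data_model": ["spatio_temporal_framework", "attribute_schema"],
    "process": ["quality", "provenance"],
    "community": ["motivation", "access_licensing", "authority"],
}


@dataclass
class DatasetDescription:
    """Facet scores for one dataset (hypothetical structure)."""
    name: str
    # compliance[space][facet] -> score in [0, 1]; absent means "not described"
    compliance: dict = field(default_factory=dict)

    def record(self, space, facet, score):
        """Store how closely a facet complies with its desired 'optimal' state."""
        if facet not in FACETS.get(space, []):
            raise KeyError(f"unknown facet {space}/{facet}")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must lie in [0, 1]")
        self.compliance.setdefault(space, {})[facet] = score


def fitness(description, weights):
    """Weighted mean compliance over the facets a consumer cares about.

    `weights` maps (space, facet) pairs to non-negative importance values.
    Assumption: a facet with no recorded score contributes 0.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    weighted = 0.0
    for (space, facet), weight in weights.items():
        weighted += weight * description.compliance.get(space, {}).get(facet, 0.0)
    return weighted / total


def rank(descriptions, weights):
    """Order datasets from most to least fit for the consumer's task."""
    return sorted(descriptions, key=lambda d: fitness(d, weights), reverse=True)


if __name__ == "__main__":
    survey = DatasetDescription("survey_a")
    survey.record("process", "quality", 0.5)
    survey.record("community", "authority", 1.0)

    # A consumer whose task depends mostly on quality, somewhat on authority.
    task_weights = {("process", "quality"): 2.0, ("community", "authority"): 1.0}
    print(survey.name, fitness(survey, task_weights))
```

Because facets are plain dictionary entries, they can be added or retracted without changing the scoring code, and different consumers can supply different weights over the same descriptions.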

Similar resources

On the Distinctions between Semantics and Pragmatics

The term pragmatics was proposed by Morris in 1938 as a tribute to C. S. Peirce's philosophy of pragmatism, to designate the study of signs and their relationship to interpreters. In 1946 Morris changed this slightly to make pragmatics the study of the origin, use and effect of signs. One of the main differences between the two versions is that in the second version the term use also includes t...

Conclusion: Re-evaluating Eclecticism

When looking at the original data analysis, we see a variety of approaches used to examine the discourse data of focus. The analysis is rich and includes a wide array of features. Conversely, the three single perspective analyses conducted for this Forum each drew upon different linguistic details to support their conclusions with different insights. It is important to consider how different ap...

Dynamics and Pragmatics of 'Peirce's Puzzle'

An intriguing puzzle due to Charles Sanders Peirce (Peirce 1906) has recently regained the interest of semanticists. It has been argued that the puzzle should be analyzed by means of a dynamic or E-type analysis of non-bound pronouns. In this paper we will first argue that "Peirce's Puzzle", basically, doesn't have anything to do with non-bound pronouns and that, consequently, a dynamic or E-type...

Technology and the Future of Dispute Systems Design

A. The First Decade of Online Dispute Resolution: Some Illustrative Examples
  1. eBay
  2. Cybersettle and SmartSettle
  3. The Mediation Room
B. The Second Dec...

An overview of the NFAIS 2014 Annual Conference - Giving Voice to Content: Re-Envisioning the Business of Information

This paper provides an overview of the highlights of the 2014 NFAIS Annual Conference, Giving Voice to Content: Re-Envisioning the Business of Information, that was held in Philadelphia, PA, on February 23–25, 2014. The goal of the conference was to take a look at the opportunities offered by current Big Data technologies (metrics, analytics, data mining, visualization, linking, etc.) and chall...

Publication date: 2014